Stock Price Forecasting¶

Exploring the time-series problem of stock market forecasting.

All datasets are available at:

BTC, AAPL, MSFT, TSLA, ^IXIC (NASDAQ), ^BVSP (IBOVESPA):
https://finance.yahoo.com/

S&P 500:
https://www.kaggle.com/datasets/andrewmvd/sp-500-stocks?select=sp500_index.csv

Group Members:

200028880 - Wallace Ben Teng Lin Wu

222011561 - Mateus Elias de Macedo

222011525 - Erick Hideki Taira

221029051 - Rodrigo Marques Maranhao

Preprocessing¶

In [2]:
import pandas as pd
import datetime

def str_to_datetime(s):
    """ Converts a string object to the respective datetime object"""

    year, month, day = [int(i) for i in s.split('-')]
    return datetime.datetime(year=year, month=month, day=day)


price_dict = {
    "Adj Close" : "Price",
    "S&P500" : "Price",
}

def load_df(filename):
    """
    Create a pandas dataframe, filter to leave only the Price column,
    convert date to datetime and make it the index
    """

    df = pd.read_csv(filename)
    df.rename(columns = price_dict, inplace = True)

    # Univariate analysis
    df = df[["Date", "Price"]]

    # Convert date type objects to datetime object
    df["Date"] = df["Date"].apply(str_to_datetime)

    # Turn "Date" Column into dataframe index
    df.index = df.pop("Date")

    return df.dropna()


df = load_df("Datasets/MSFT.csv")
In [3]:
df
Out[3]:
Price
Date
1986-03-13 0.060396
1986-03-14 0.062553
1986-03-17 0.063632
1986-03-18 0.062014
1986-03-19 0.060936
... ...
2023-11-01 346.070007
2023-11-02 348.320007
2023-11-03 352.799988
2023-11-06 356.529999
2023-11-07 360.529999

9491 rows × 1 columns

In [4]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
plt.plot(df.index, df["Price"])
plt.title("Full Dataset")
plt.show()
In [5]:
# Choose the number of days to keep from the dataset
days = 5000 # ~20 years of trading days (2003-2023)

# Number of past days used as the model input
lookback = 15


def df_to_windowed(fullDF, n=lookback, daysSelected=days):
    """
    Create a windowed Dataframe (converting into a supervised problem).
    Therefore, the last {lookback} days prices will be the (input)
    and will generate the next day price (output)
    """

    tmp_df = pd.DataFrame()
    for i in range(n, 0, -1):
        tmp_df[f"Last-{i} Price"] = fullDF["Price"].shift(periods=i)
    tmp_df["Price"] = fullDF["Price"]

    return tmp_df.dropna()[-daysSelected:]


windowed_df = df_to_windowed(df)
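The shift-based windowing above can be checked on a toy series; a minimal sketch assuming a 3-day lookback (the prices are made up for illustration):

```python
import pandas as pd

# Toy series: hypothetical prices 1..6
toy = pd.DataFrame({"Price": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

lookback = 3
win = pd.DataFrame()
for i in range(lookback, 0, -1):
    # shift(i) aligns the price from i days ago with the current row
    win[f"Last-{i} Price"] = toy["Price"].shift(periods=i)
win["Price"] = toy["Price"]
win = win.dropna()  # the first {lookback} rows have no full history

# The first valid row predicts day 4 from days 1-3
print(win.iloc[0].tolist())  # [1.0, 2.0, 3.0, 4.0]
```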
In [6]:
windowed_df
Out[6]:
Last-15 Price Last-14 Price Last-13 Price Last-12 Price Last-11 Price Last-10 Price Last-9 Price Last-8 Price Last-7 Price Last-6 Price Last-5 Price Last-4 Price Last-3 Price Last-2 Price Last-1 Price Price
Date
2003-12-29 16.282078 16.445030 16.532766 16.664377 16.676914 16.701979 16.758381 16.958931 16.946400 17.172014 17.146950 17.034134 17.015335 16.946400 17.052935 17.209623
2003-12-30 16.445030 16.532766 16.664377 16.676914 16.701979 16.758381 16.958931 16.946400 17.172014 17.146950 17.034134 17.015335 16.946400 17.052935 17.209623 17.247215
2003-12-31 16.532766 16.664377 16.676914 16.701979 16.758381 16.958931 16.946400 17.172014 17.146950 17.034134 17.015335 16.946400 17.052935 17.209623 17.247215 17.153221
2004-01-02 16.664377 16.676914 16.701979 16.758381 16.958931 16.946400 17.172014 17.146950 17.034134 17.015335 16.946400 17.052935 17.209623 17.247215 17.153221 17.203354
2004-01-05 16.676914 16.701979 16.758381 16.958931 16.946400 17.172014 17.146950 17.034134 17.015335 16.946400 17.052935 17.209623 17.247215 17.153221 17.203354 17.635786
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2023-11-01 332.420013 331.160004 327.730011 332.640015 332.059998 330.109985 331.320007 326.670013 329.320007 330.529999 340.670013 327.890015 329.809998 337.309998 338.109985 346.070007
2023-11-02 331.160004 327.730011 332.640015 332.059998 330.109985 331.320007 326.670013 329.320007 330.529999 340.670013 327.890015 329.809998 337.309998 338.109985 346.070007 348.320007
2023-11-03 327.730011 332.640015 332.059998 330.109985 331.320007 326.670013 329.320007 330.529999 340.670013 327.890015 329.809998 337.309998 338.109985 346.070007 348.320007 352.799988
2023-11-06 332.640015 332.059998 330.109985 331.320007 326.670013 329.320007 330.529999 340.670013 327.890015 329.809998 337.309998 338.109985 346.070007 348.320007 352.799988 356.529999
2023-11-07 332.059998 330.109985 331.320007 326.670013 329.320007 330.529999 340.670013 327.890015 329.809998 337.309998 338.109985 346.070007 348.320007 352.799988 356.529999 360.529999

5000 rows × 16 columns

In [7]:
windowed_df["Price"].describe()
Out[7]:
count    5000.000000
mean       80.652762
std        92.782729
min        11.327569
25%        20.142975
50%        30.797449
75%       102.435174
max       360.529999
Name: Price, dtype: float64
In [8]:
def split_xy(windowedNP):
    """
    Split np.array into X and y
    """

    X = windowedNP[:, :-1]
    y = windowedNP[:, -1]
    return (X, y)

Standardization¶

Standardization != Normalization
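Before applying StandardScaler below, a quick sketch of the distinction, computed by hand with NumPy on toy numbers:

```python
import numpy as np

# Hypothetical prices for illustration
x = np.array([10.0, 20.0, 30.0, 40.0])

# Standardization (what StandardScaler does): zero mean, unit variance;
# values are NOT bounded to any fixed interval
z = (x - x.mean()) / x.std()

# Min-max normalization, by contrast: rescales into [0, 1]
m = (x - x.min()) / (x.max() - x.min())

print(z)  # mean 0, std 1
print(m)  # all values inside [0, 1]
```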

In [9]:
from IPython import display
display.Image("Images/Standardization.png")
Out[9]:
In [10]:
from sklearn.preprocessing import StandardScaler

def scale_data(train, vali, test):
    """ Get Scaled Data """ 
    
    scaler = StandardScaler()
    X_train, y_train = split_xy(scaler.fit_transform(train))
    X_vali, y_vali = split_xy(scaler.transform(vali))
    X_test, y_test = split_xy(scaler.transform(test))
    return scaler, [X_train, X_vali, X_test], [y_train, y_vali, y_test]
In [11]:
def descale_data(train, vali, test, pred, scaler):
    """ Get de-Scaled Data """ 
    X_train, y_train = split_xy(train.to_numpy())
    X_vali, y_vali = split_xy(vali.to_numpy())
    X_test, y_test = split_xy(test.to_numpy())
    X_result, y_result = split_xy(scaler.inverse_transform(pred))
    return [y_train, y_vali, y_test, y_result]

Models¶

In [13]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# model input: (last {lookback} days prices, 1 feature = "price")
models = []

1 Dimensional Convolution:

In [15]:
display.Image("Images/MaxPooling.png")
Out[15]:
In [50]:
display.Image("Images/Conv+Max.png")
Out[50]:
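The Conv1D + MaxPooling1D pair illustrated above can be sketched in NumPy; a minimal example assuming a single hand-picked kernel of size 3, "valid" padding (the models below use "same"), and a pool of size 2:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # toy input sequence
kernel = np.array([1.0, 0.0, -1.0])           # hypothetical learned filter

# 1D convolution over the sequence (as cross-correlation,
# which is what Keras Conv1D actually computes)
conv = np.array([np.dot(x[i:i + 3], kernel) for i in range(len(x) - 2)])
print(conv)  # [-2. -2. -2. -2.]

# Max pooling with pool_size=2: keep the maximum of each pair,
# halving the sequence length
pooled = conv.reshape(-1, 2).max(axis=1)
print(pooled)  # [-2. -2.]
```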
In [51]:
display.Image("Images/LSTM.png")
Out[51]:
In [16]:
display.Image("Images/Forget_Gate.png")
Out[16]:
In [17]:
display.Image("Images/Input_Gate.png")
Out[17]:
In [18]:
display.Image("Images/Output_Gate.png")
Out[18]:
In [19]:
models.append(
    Sequential([ # CNN+LSTM+Dropout
       layers.Input((lookback, 1)),
       layers.Conv1D(128, kernel_size=3, activation="relu", padding="same"),
       layers.MaxPooling1D(pool_size=2, padding="same"),
       layers.LSTM(128, return_sequences=True),
       layers.Flatten(),
       layers.Dropout(0.3),
       layers.Dense(128),
       layers.Dense(1)
    ]),
)
models[-1].summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1d (Conv1D)             (None, 15, 128)           512       
                                                                 
 max_pooling1d (MaxPooling1  (None, 8, 128)            0         
 D)                                                              
                                                                 
 lstm (LSTM)                 (None, 8, 128)            131584    
                                                                 
 flatten (Flatten)           (None, 1024)              0         
                                                                 
 dropout (Dropout)           (None, 1024)              0         
                                                                 
 dense (Dense)               (None, 128)               131200    
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 263425 (1.00 MB)
Trainable params: 263425 (1.00 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [20]:
models.append(
    Sequential([ # LSTM
        layers.Input((lookback, 1)),
        layers.LSTM(128, return_sequences=True),
        layers.Flatten(),
        layers.Dropout(0.3),
        layers.Dense(128),
        layers.Dense(1)
    ]),
)
models[-1].summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm_1 (LSTM)               (None, 15, 128)           66560     
                                                                 
 flatten_1 (Flatten)         (None, 1920)              0         
                                                                 
 dropout_1 (Dropout)         (None, 1920)              0         
                                                                 
 dense_2 (Dense)             (None, 128)               245888    
                                                                 
 dense_3 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 312577 (1.19 MB)
Trainable params: 312577 (1.19 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [21]:
models.append(
    Sequential([ # CNN
        layers.Input((lookback, 1)),
        layers.Conv1D(128, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling1D(pool_size=2, padding="same"),
        layers.Flatten(),
        layers.Dense(128),
        layers.Dense(1)
    ]),
)
models[-1].summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv1d_1 (Conv1D)           (None, 15, 128)           512       
                                                                 
 max_pooling1d_1 (MaxPoolin  (None, 8, 128)            0         
 g1D)                                                            
                                                                 
 flatten_2 (Flatten)         (None, 1024)              0         
                                                                 
 dense_4 (Dense)             (None, 128)               131200    
                                                                 
 dense_5 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 131841 (515.00 KB)
Trainable params: 131841 (515.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [22]:
models.append(
    Sequential([ # Simple Neural Network
        layers.Input((lookback, 1)),
        layers.Flatten(),
        layers.Dense(128),
        layers.Dense(128),
        layers.Dense(1)
    ]),
)
models[-1].summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_3 (Flatten)         (None, 15)                0         
                                                                 
 dense_6 (Dense)             (None, 128)               2048      
                                                                 
 dense_7 (Dense)             (None, 128)               16512     
                                                                 
 dense_8 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 18689 (73.00 KB)
Trainable params: 18689 (73.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Model Training¶

Auxiliary Functions¶

In [23]:
# For each window: ~70% train, 15% validation, 15% test (2100/450/450 days)
def sliding_window(windowed, trainSZ=2100, valiSZ=450, testSZ=450, step=20):
    """
    Sliding Window Generator
    """

    for i in range(0, len(windowed) - trainSZ - testSZ - valiSZ + 1, step):
        train_slice = windowed[i : i+trainSZ]
        vali_slice = windowed[i+trainSZ : i+trainSZ+valiSZ]
        test_slice = windowed[i+trainSZ+valiSZ : i+trainSZ+valiSZ+testSZ]
        yield (train_slice, vali_slice, test_slice)
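With the sizes above, the number of windows follows directly from the generator's range; a quick arithmetic check assuming the 5000-row windowed dataset:

```python
# Number of windows = floor((N - train - vali - test) / step) + 1
N, trainSZ, valiSZ, testSZ, step = 5000, 2100, 450, 450, 20
n_windows = (5000 - 2100 - 450 - 450) // step + 1
print(n_windows)  # 101, matching the count in the Results section
```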
In [24]:
# Plot the windows' intervals and count the number of windows
plot_generator = sliding_window(windowed_df)
plt.figure(figsize=(10,6))
windows_cnt = 0
for train, vali, test in plot_generator:
    plt.axvline(train.index[0], color="tab:gray")
    plt.plot(train.index, train["Price"], color="tab:blue")
    plt.plot(vali.index, vali["Price"], color="tab:orange")
    plt.plot(test.index, test["Price"], color="tab:green")
    plt.axvline(test.index[-1], color="tab:gray")
    windows_cnt += 1

plt.title(f"Number of Selected Windows: {windows_cnt}")
plt.legend([
    "Training Observations",
    "Validation Observations",
    "Testing Observations",
])
plt.show()
In [25]:
import numpy as np
from sklearn.metrics import confusion_matrix

def compute_accuracy_and_cm(y_val, y_test, y_pred):
    """
    Computes the accuracy score and the confusion matrix
    For simplicity, a zero price change is counted as positive
    """

    sz = len(y_test)
    y_ref = np.append(y_val[-1], y_test)
    
    y_test_label = np.zeros(sz)
    y_pred_label = np.zeros(sz)

    acc = 0
    for i in range(sz):
        y_test_label[i] = 1 if ((y_test[i] - y_ref[i]) >= 0) else -1
        y_pred_label[i]  = 1 if ((y_pred[i] - y_ref[i]) >= 0) else -1

        if y_test_label[i] == y_pred_label[i]:
            acc += 1

    cm = confusion_matrix(y_true=y_test_label, y_pred=y_pred_label)
    return acc/sz, cm
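The direction labels compare each day against the previous observed price (the last validation price seeds day 0). The same logic can be checked on made-up numbers, vectorized here with np.where instead of the loop:

```python
import numpy as np

y_val = np.array([10.0])                # last validation price
y_test = np.array([11.0, 10.5, 12.0])   # hypothetical observed prices
y_pred = np.array([10.5, 11.2, 11.5])   # hypothetical predictions

# y_ref[i] is the actual price of the previous day
y_ref = np.append(y_val[-1], y_test)

true_dir = np.where(y_test - y_ref[:-1] >= 0, 1, -1)
pred_dir = np.where(y_pred - y_ref[:-1] >= 0, 1, -1)

acc = (true_dir == pred_dir).mean()
print(true_dir, pred_dir, acc)  # directions match on 2 of 3 days
```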
In [26]:
from matplotlib import patches
patienceSelected = 50

def plot_loss_curve(history, model_idx, i, patience=patienceSelected):
    """
    Plots the loss curve for the model fitting process
    """
    
    logs = history.history
    aux_list = [(val, i) for i, val in enumerate(logs['combine_metric'])]
    best = min(aux_list)
    last = len(logs['combine_metric'])

    plt.figure(figsize=(10,6))
    plt.title(f"Loss Curve for: Model {model_idx}, Window {i}")
    plt.plot(logs["loss"], label="Training Loss")
    plt.plot(logs["val_loss"], label="Validation Loss")
    plt.plot(logs["combine_metric"], label="Combined Loss")
    plt.ylabel("Loss")
    plt.xlabel("Epoch")

    plt.axvline(last-1, color="tab:gray", ymax=0.3, linestyle='--')
    plt.axvline(last-patience-1, color="tab:gray", ymax=0.3, linestyle='--')
    plt.axvline(best[1], color="tab:red", ymax=0.3, linestyle='--')
    
    red_patch = patches.Patch(
        color="tab:red", 
        label=f"best epoch={best[1]}")
    
    gray_patch = patches.Patch(
        color="tab:gray", 
        label=f"Early Stop Limits ({last-patience-1}, {last-1})")

    handles, labels = plt.gca().get_legend_handles_labels()
    handles.extend([red_patch, gray_patch])

    plt.legend(handles=handles, loc="upper right")
    plt.show()
In [27]:
def plot_predictions(dates, ys, metrics, model_idx, i):
    """
    Plots the predicted curve against the observed data
    """
    
    dates_train, dates_vali, dates_test = dates
    y_train, y_vali, y_test, y_result = ys
    rmse, mae, mape, r2, acc = metrics
    
    plt.figure(figsize=(10,6))
    plt.plot(dates_train, y_train)
    plt.plot(dates_vali, y_vali)
    plt.plot(dates_test, y_test)
    plt.plot(dates_test, y_result)
    plt.legend([
        "Training Observations",
        "Validation Observations",
        "Testing Observations",
        "Testing Predictions"
    ])
    plt.title(f"Model {model_idx}, Window {i} \n \
              RMSE={rmse:.3f}, MAE={mae:.3f}, MAPE={mape:.3f}, R2={r2:.3f}" )
    plt.show()
In [28]:
from sklearn.metrics import ConfusionMatrixDisplay

def plot_confusion_matrix(cm, metrics, model_idx, i):
    """
    Plots the confusion matrix for the price change classification
    """

    rmse, mae, mape, r2, acc = metrics

    cm_plt = ConfusionMatrixDisplay(cm, 
                                    display_labels=["Positive", "Negative"])
    cm_plt.plot()
    cm_plt.ax_.set(
        title= f"Model {model_idx}, Window {i}, Accuracy={acc:.3f}",
        xlabel= "Predicted Price Change",
        ylabel= "Actual Price Change"
    )
    plt.show()
In [29]:
from keras.callbacks import EarlyStopping , Callback, ModelCheckpoint
import h5py 
 
class CombineCallback(Callback):
    def __init__(self, **kargs):
        super(CombineCallback, self).__init__(**kargs)
    def on_epoch_end(self, epoch, logs={}):
        f = 0.2 # f=vali_factor, 80% training loss, 20% validation loss
        logs['combine_metric'] = f*logs['val_loss']+(1-f)*logs['loss']

combined_cb = CombineCallback()
model_checkpoint = ModelCheckpoint(
    filepath="Models/tmp_best_model.h5", 
    monitor="combine_metric", 
    mode="min", 
    save_best_only=True, 
    save_weights_only=True,
    verbose=False
)
earlyStop = EarlyStopping(monitor="combine_metric", 
                          min_delta=0, 
                          patience=patienceSelected, 
                          mode="min", 
                          verbose=False)
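The combined early-stopping metric blends training and validation loss 80/20; a toy check of the arithmetic with hypothetical epoch losses:

```python
f = 0.2  # validation weight, as in CombineCallback
loss, val_loss = 0.10, 0.30  # hypothetical epoch losses
combine_metric = f * val_loss + (1 - f) * loss
print(round(combine_metric, 2))  # 0.14
```

Monitoring this blend rather than val_loss alone keeps checkpointing and early stopping from chasing epochs where validation loss dips while training loss is still poor.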

Main Function¶

In [30]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.metrics import r2_score, mean_absolute_percentage_error
from tensorflow.keras.optimizers import Adam

def cross_validation(model, generator, model_idx, flag_plot=0):
    """
    Performs cross validation for a single model over all sliding windows

    Calculates the cross validation score:
        ("RMSE", "MAE", "MAPE", "R2", "Accuracy");

    Accuracy is computed by classifying if the relative price change 
    for day i was positive or negative
    """

    cv_score = pd.DataFrame(columns=["RMSE", "MAE", "MAPE", "R2", "Acc"])

    for i, (train, vali, test) in enumerate(generator):
        # Get Dates = [dates_train, dates_vali, dates_test]
        dates = [i.index for i in [train, vali, test]]

        # Scale data
        scaler, X_sc, y_sc = scale_data(train, vali, test)
        X_train_sc, X_vali_sc, X_test_sc = X_sc
        y_train_sc, y_vali_sc, y_test_sc = y_sc

        # Fit, save best model and Predict
        model.load_weights("Models/empty_model.h5", 
                           skip_mismatch=True, by_name=True)
        
        model.reset_states()
        history = model.fit(
            X_train_sc, y_train_sc,
            validation_data=(X_vali_sc, y_vali_sc),
            epochs=200, # maximum number of epochs
            batch_size=64, # larger batches help jump out of local minima
            verbose=False,
            callbacks=[combined_cb, earlyStop, model_checkpoint]
        )
        model.load_weights("Models/tmp_best_model.h5",
                           skip_mismatch=True, by_name=True)
        
        preds_sc = model.predict(X_test_sc, verbose=False)

        # Descale data
        stacked_pred = np.hstack((X_test_sc, preds_sc))
        ys =  descale_data(train, vali, test, stacked_pred, scaler)
        [y_train, y_vali, y_test, y_result] = ys

        # Compute Metrics
        rmse = mean_squared_error(y_test, y_result, squared=False)
        mae = mean_absolute_error(y_test, y_result)
        mape = mean_absolute_percentage_error(y_test, y_result)
        r2 = r2_score(y_test, y_result)
        acc, cm = compute_accuracy_and_cm(y_vali, y_test, y_result)

        metrics = [rmse, mae, mape, r2, acc]

        # Plot All Curves and Metrics; Also loss curves
        if flag_plot == 2:
            plot_loss_curve(history, model_idx, i)
            plot_predictions(dates, ys, metrics, model_idx, i)
            plot_confusion_matrix(cm, metrics, model_idx, i)

        # Plot only 5 Curves and Metrics;
        elif (flag_plot == 1 and (i % (windows_cnt//5)) == 0):
            plot_loss_curve(history, model_idx, i)
            plot_predictions(dates, ys, metrics, model_idx, i)
            plot_confusion_matrix(cm, metrics, model_idx, i)

        # Append Result
        cv_score.loc[len(cv_score)] = metrics

    return cv_score

# For each model, perform a cross validation training,
# plot graphs and compute metrics if wanted
cv_scores = []
for i, model in enumerate(models):
    model.compile(
        loss="mean_squared_error",
        optimizer=Adam(learning_rate=0.0001)
    )   
    model.save_weights("Models/empty_model.h5")
    generator = sliding_window(windowed_df)
    cv_score = cross_validation(model, generator, i, 1)
    cv_scores.append(cv_score)
(Loss curves, prediction plots, and confusion matrices for the sampled windows of each model.)

Results¶

In [31]:
# Summary statistics (mean, std, quartiles) for each model
for i, cv_score in enumerate(cv_scores):
    print(f"Model {i}")
    print(cv_score.describe(), "\n\n")
Model 0
             RMSE         MAE        MAPE          R2         Acc
count  101.000000  101.000000  101.000000  101.000000  101.000000
mean     4.236400    3.355055    0.022731    0.957862    0.515270
std      3.065649    2.524720    0.012100    0.029890    0.043998
min      0.635295    0.433698    0.008721    0.854172    0.428889
25%      1.334520    1.040699    0.014059    0.934139    0.488889
50%      4.876371    3.997817    0.016660    0.967194    0.513333
75%      5.784630    4.645553    0.034697    0.980976    0.560000
max     11.643655    9.954925    0.048190    0.994521    0.588889 


Model 1
             RMSE         MAE        MAPE          R2         Acc
count  101.000000  101.000000  101.000000  101.000000  101.000000
mean    13.780756   12.036860    0.082346    0.500535    0.542948
std      9.056505    8.339012    0.030284    0.300786    0.032229
min      2.019087    1.454082    0.019060   -0.640108    0.482222
25%      5.270709    4.347477    0.058057    0.349083    0.511111
50%     14.465550   12.440581    0.078015    0.532314    0.555556
75%     20.402684   17.807823    0.107990    0.727192    0.571111
max     38.873368   36.749581    0.149293    0.963882    0.588889 


Model 2
             RMSE         MAE        MAPE          R2         Acc
count  101.000000  101.000000  101.000000  101.000000  101.000000
mean     2.675261    2.006761    0.014955    0.978179    0.505457
std      1.673573    1.289667    0.005663    0.021644    0.042148
min      0.600988    0.399974    0.008164    0.839600    0.411111
25%      1.110362    0.894583    0.012047    0.970991    0.477778
50%      1.865197    1.508048    0.014228    0.986992    0.506667
75%      4.044716    3.041439    0.015414    0.989743    0.537778
max      5.660837    4.464485    0.048418    0.995586    0.586667 


Model 3
             RMSE         MAE        MAPE          R2         Acc
count  101.000000  101.000000  101.000000  101.000000  101.000000
mean     2.636605    1.952849    0.014096    0.979810    0.513971
std      1.712076    1.303370    0.004731    0.020183    0.049396
min      0.619184    0.433852    0.007427    0.858374    0.413333
25%      0.943867    0.734659    0.011343    0.974167    0.486667
50%      2.279445    1.671087    0.013157    0.987064    0.513333
75%      4.094996    3.010350    0.015380    0.990371    0.560000
max      5.624470    4.402394    0.035729    0.996054    0.600000 


In [32]:
# Aggregate each metric across models
metrics = ["RMSE", "MAE", "MAPE", "R2", "Acc"]

evaluation = {
    "RMSE": {},
    "MAE": {},
    "MAPE": {},
    "R2": {},
    "Acc": {},
}

for model_idx, cv_score in enumerate(cv_scores):
    for param in metrics:
        evaluation[param][f"Model {model_idx}"] = cv_score[param].mean()

def plot_metric(param, logScale=False):
    """
    Plots an evaluation metric, comparing each model
    """

    plt.figure(figsize=(10,6))
    plt.title(param + " in Log Scale" if logScale else param)
    plt.bar(list(evaluation[param].keys()),
            list(evaluation[param].values()),
            color="tab:orange" if logScale else "tab:blue")
    plt.xlabel("Models")
    plt.ylabel("Metric Value")
    if logScale:
        plt.yscale("log")
    plt.show()
In [33]:
display.Image("Images/RootMeanSquareError.png")
Out[33]:
In [34]:
plot_metric("RMSE")
In [35]:
plot_metric("RMSE", 1)
In [36]:
display.Image("Images/MeanAbsoluteError.png")
Out[36]:
In [37]:
plot_metric("MAE")
In [38]:
plot_metric("MAE", 1)
In [39]:
display.Image("Images/MeanAbsolutePercentageError.png")
Out[39]:
In [40]:
plot_metric("MAPE")
In [41]:
plot_metric("MAPE", 1)
In [42]:
display.Image("Images/R2-DeterminationCoefficient.png")
Out[42]:
In [43]:
plot_metric("R2")
In [44]:
plot_metric("R2", 1)
In [45]:
display.Image("Images/Accuracy.png")
Out[45]:
In [46]:
plot_metric("Acc")
In [47]:
plot_metric("Acc", 1)
In [48]:
# Output complete results   
for i, cv_score in enumerate(cv_scores):
    print(f"Model {i}")
    print(cv_score)
Model 0
         RMSE       MAE      MAPE        R2       Acc
0    0.635295  0.433698  0.011143  0.967585  0.533333
1    0.659344  0.465437  0.011741  0.970691  0.497778
2    0.773575  0.547821  0.013379  0.962671  0.506667
3    0.809641  0.577303  0.013925  0.958412  0.520000
4    0.830475  0.594897  0.014164  0.956718  0.506667
..        ...       ...       ...       ...       ...
96   5.784630  4.549013  0.016660  0.970014  0.493333
97   5.880483  4.645553  0.016900  0.972076  0.491111
98   5.960980  4.699542  0.017123  0.970983  0.508889
99   5.807565  4.578021  0.016677  0.972225  0.488889
100  5.740368  4.539392  0.016489  0.974650  0.486667

[101 rows x 5 columns]
Model 1
          RMSE        MAE      MAPE        R2       Acc
0     2.211983   1.826473  0.045304  0.607029  0.506667
1     2.655621   2.098191  0.050468  0.524547  0.506667
2     2.619101   2.100613  0.049882  0.572100  0.506667
3     3.341452   2.866804  0.068010  0.291635  0.504444
4     2.019087   1.454082  0.033128  0.744162  0.482222
..         ...        ...       ...       ...       ...
96   21.359799  19.012744  0.067211  0.591149  0.500000
97   17.467031  14.649426  0.051403  0.753631  0.491111
98   29.589813  25.999415  0.090285  0.285015  0.484444
99    6.622615   5.254152  0.019060  0.963882  0.493333
100  11.587038   9.857194  0.034388  0.896712  0.497778

[101 rows x 5 columns]
Model 2
         RMSE       MAE      MAPE        R2       Acc
0    0.600988  0.399974  0.010302  0.970991  0.491111
1    1.257667  1.119494  0.028087  0.893363  0.493333
2    0.633659  0.422459  0.010480  0.974953  0.506667
3    0.685503  0.469423  0.011507  0.970187  0.508889
4    0.829467  0.624175  0.015021  0.956823  0.495556
..        ...       ...       ...       ...       ...
96   5.564676  4.395928  0.016059  0.972251  0.504444
97   5.636228  4.451804  0.016168  0.974348  0.504444
98   5.660837  4.464485  0.016234  0.973832  0.502222
99   5.575831  4.397707  0.015991  0.974398  0.511111
100  5.550863  4.390614  0.015910  0.976296  0.497778

[101 rows x 5 columns]
Model 3
         RMSE       MAE      MAPE        R2       Acc
0    0.769167  0.575990  0.014757  0.952484  0.493333
1    0.918224  0.742942  0.018676  0.943158  0.493333
2    0.843463  0.653367  0.016164  0.955622  0.493333
3    1.494096  1.342167  0.032920  0.858374  0.504444
4    0.704799  0.491151  0.011900  0.968827  0.513333
..        ...       ...       ...       ...       ...
96   5.517242  4.299079  0.015730  0.972722  0.515556
97   5.602514  4.370810  0.015882  0.974654  0.524444
98   5.624470  4.402394  0.015994  0.974167  0.506667
99   5.554105  4.336411  0.015772  0.974597  0.535556
100  5.540707  4.350271  0.015768  0.976382  0.533333

[101 rows x 5 columns]